Skip to content

[Bugfix] - Changes in logic to delete share db #1706

Merged
dlpzx merged 11 commits intodata-dot-all:mainfrom
TejasRGitHub:GH-1633-bug-fix
Feb 4, 2025
Merged

[Bugfix] - Changes in logic to delete share db #1706
dlpzx merged 11 commits intodata-dot-all:mainfrom
TejasRGitHub:GH-1633-bug-fix

Conversation

@TejasRGitHub
Copy link
Collaborator

@TejasRGitHub TejasRGitHub commented Nov 19, 2024

Feature or Bugfix

  • Bugfix

Detail

  • Additional locking when revoking a dataset with the same database shared across various datasets
  • Modifications in checks to see if the principal / database combo is still shared else where

Relates

Security

Please answer the questions below briefly where applicable, or write N/A. Based on
OWASP 10.

  • Does this PR introduce or modify any input fields or queries - this includes
    fetching data from storage outside the application (e.g. a database, an S3 bucket)? N/A
    • Is the input sanitized?
    • What precautions are you taking before deserializing the data you consume? N/A
    • Is injection prevented by parametrizing queries?
    • Have you ensured no eval or similar functions are used?
  • Does this PR introduce any functionality or component that requires authorization? N/A
    • How have you ensured it respects the existing AuthN/AuthZ mechanisms?
    • Are you logging failed auth attempts?
  • Are you using or adding any cryptographic features? N/A
    • Do you use a standard proven implementations?
    • Are the used keys controlled by the customer? Where are they stored?
  • Are you introducing any new policies/roles/users? N/A
    • Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@TejasRGitHub
Copy link
Collaborator Author

TejasRGitHub commented Nov 19, 2024

Test cases:

Onboarded two dataset ( D1, D2 ) with the same common_db. Onboarded a consumption role ( C1 ). Environment has two roles ( Scientist, Engineers )

Scenerio 1:

  1. Created a share S1 with table from C1 to D1 ✅ ( Checked if the tables are accessible via Athena )
  2. Created a share S2 with table from C1 to D2 ✅ ( Checked if the tables are accessible via Athena )
  3. Revoked share S2 and checked if the common_db_shared still exists in the account ✅
  4. Queried tables from S1 to validate access and also checked if LF permissions for the principal exists ✅

Scenario 2:

  1. Repeated steps 1 and 2.
  2. Created another share S3 with D1 from environment role : Scientist ( i.e. creating share within the same environment )
  3. Checked if access to the table exists for this role ✅
  4. Now revoked S1 and check if the access in S3 and S2 are intact ✅ ( Checked that the permission to the principals exists and Athena query works)
  5. Revoked S3 and checked if the access to S2 is present ✅ ( Checked that the permission to the principals exists and Athena query works)
  6. Revoked S2 and checked now that the shared database is deleted ✅

TODO
Scenario 3:

  1. Repeat steps 1 and 2,
  2. Onboard another dataset (D3) in another enviironment where the database name is the same i.e. common_db
  3. Now make a share S1 from C1 to this dataset ( D3 ) and also make a share S2 with D2 or D1
  4. Revoke share items in share S2 and check if the common_shared db is still present ✅

Copy link
Contributor

@dlpzx dlpzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes requested in https://github.com/data-dot-all/dataall/pull/1706/files#r1933588925 need to be fixed. I have not completed the full review because those changes might impact the whole implementation

@TejasRGitHub
Copy link
Collaborator Author

The changes requested in https://github.com/data-dot-all/dataall/pull/1706/files#r1933588925 need to be fixed. I have not completed the full review because those changes might impact the whole implementation

@dlpzx , I have made changes to fix the structure of the code.

There are lot of file which have changed when I did make lint with the newer version of ruff due to which you are seeing a lot of file which are updated. I have marked the files which are a part of this PR to make it easy to review PR

# Conflicts:
#	tests/modules/s3_datasets/test_dataset_glossary.py
Copy link
Contributor

@dlpzx dlpzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only some imports need to be removed. The rest looks good IMO.
I am testing the ruff changes in a separate PR. There are format changes that require testing

# Conflicts:
#	backend/dataall/modules/s3_datasets/aws/athena_table_client.py
#	backend/dataall/modules/s3_datasets_shares/aws/glue_client.py
#	tests/modules/s3_datasets_shares/tasks/test_s3_access_point_share_manager.py
@TejasRGitHub
Copy link
Collaborator Author

TejasRGitHub commented Jan 30, 2025

Only some imports need to be removed. The rest looks good IMO. I am testing the ruff changes in a separate PR. There are format changes that require testing

Thanks @dlpzx , removed unused imports and also synced PR branch with. the main. now the ruff formatting changes should not appear.

Also I tested out the code changes in the PR with the same tests cases as mentioned over here - #1706 (comment)

@TejasRGitHub TejasRGitHub requested a review from dlpzx January 30, 2025 17:32
@TejasRGitHub
Copy link
Collaborator Author

@dlpzx , I ran make lint and also unit test and everythings passed on my end. Acc to me, I think now this PR is also ready to be merged

Copy link
Contributor

@dlpzx dlpzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one minor comment, the rest looks good~

@TejasRGitHub
Copy link
Collaborator Author

Left one minor comment, the rest looks good~

@dlpzx , Updated PR with correction

@dlpzx dlpzx requested a review from SofiaSazonova February 4, 2025 07:18
@dlpzx dlpzx merged commit 00fcc93 into data-dot-all:main Feb 4, 2025
9 checks passed
dlpzx added a commit that referenced this pull request Feb 6, 2025
### Feature or Bugfix
- Bugfix

### Detail
In #1706 all the scenarios
tested had multiple Datasets sharing a shared-Glue database. The issue
is that when there is no common shared Glue database the lock will try
to lock an empty list of resources, resulting in timeouts in
`backend/dataall/modules/s3_datasets_shares/services/share_processors/glue_table_share_processor.py:248`

### Relates
- #1706

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
@TejasRGitHub TejasRGitHub added priority: high type: bug Something isn't working labels Apr 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority: high type: bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants